REMARKS 

The Office Action of March 27, 2006 has been carefully 
considered. 

The title has been amended as suggested in the Office 
Action, and claims 15, 21, 31, 35, 40 and 41 have been amended 
to correct the improper use of periods within the claims. 
Claim 42 has been canceled, and claim 35 has been amended to
correct the terminology. 

Claims 1-4, 9, 11-13, 33 and 34 have been rejected under
35 USC 102(b) over DeHon. 

The invention is directed to devices of dynamically
reconfigurable architecture, which has a number of advantages
over the prior art:

The reconfigurable architecture of the invention is
constructed from a library of controller elements and
processing elements, which allows integrated circuit designers
to design dynamically reconfigurable devices to target
specific application areas. This is described in the
specification, and enables different devices to be optimized
for different applications (see claims 36, 37 and 38).

The processing elements of the invention can be
concatenated (i.e. any output can be routed to any input of a
processing element), as is now claimed in claim 1. This allows
different functional datapaths to be formed. More importantly,
these datapaths can be formed on a clock cycle by clock cycle
basis, allowing many operations to be performed per clock
cycle on the same source data. This approach is very different
from those of the cited prior art. The advantages of this
approach are that the clock rate is reduced, there is no
requirement for pipeline registers for storing intermediate
results, the area is therefore reduced, power consumption is
reduced and testing is easier. It also allows the same
processing elements to be dynamically reconfigured on a clock
cycle by clock cycle basis to implement different functions.
Because the claimed processing elements include basic
functions, it is easy to "construct" the more complex functions
required by many applications. This makes our architecture
more flexible and commercially viable, as it can target many
applications.

The claimed controller elements are just that: elements. 
They are self-contained / dedicated logic circuits and not 
formed by combining groups of elements. 

An "out of sequence" processing architecture can be
implemented according to the invention, as the processing
resources can be shared among many controller elements, and a
timestamp and source address mark the results so they can be
automatically directed to the source controller and
reordered. All other processors, including those cited in the
Office Action, are sequential processors that use the same
global clock and require data to be processed in sequence. This
is inefficient, as many algorithms require out of sequence
processing. Our architecture allows "out of sequence"
processing due to the multiple clocks and the dynamic,
statistical use of the available resources.

The implementation of Finite State Machines (FSMs) of the
invention is different in that they are based on dynamically
reconfigurable logic. The invention creates the logic to
implement the next state(s) as required from the available
logic. The invention does not implement a very large PLA and
all the circuitry to implement each and every state at device
initialization. In fact, by definition, the FSMs and
combinatorial logic implemented by the PLA of the cited art are
fixed and not dynamically reconfigurable.

In addition, the claimed architecture allows a plurality
of processing elements to be accessed/used by a plurality of
controller elements, as the processing elements are clocked at
a high rate that is proportional to the number of controller
elements. It also allows the accessing of the processing
elements by the various controller elements to be performed on
a statistical basis dependent on the application. This is
another novel feature of the claimed dynamically
reconfigurable architecture. One of the advantages of this
approach is that it allows the same processing elements to be
used for different control or processing flows. This
significantly reduces the overall device area and cost.

In the claimed architecture, the controller memories can 
be dynamically updated with new code (from the master 
controller) to implement new functions rather than implement 
separate functions in logic. This, therefore, reduces the size
and cost of the reconfigurable device. In the prior art,
devices are configured once at initialization, and there is no 
real time dynamic reconfiguration of the controller memories. 

In the claimed architecture, both controller elements and 
processing elements can be implemented using asynchronous 
logic. This is significantly different to any of the cited 
prior art documents. In fact, asynchronous logic is not 
mentioned in any of the cited prior art. 

In the claimed architecture, the routing overhead is 
significantly reduced. The routing required in the cited art 
is very large and would cause many signal integrity problems 
and device layout problems. The size overhead would 
necessitate more area dedicated to routing than to actual 
processing logic, and this would be inefficient in terms of 
silicon usage, very expensive to fabricate and very
unwieldy, if not totally impractical.

In the claimed reconfigurable architecture, each
processing element (assuming an 8-bit processing element)
would only require, on average, 10 x 8-bit buses, representing 
a significant reduction over the art. 

The processing elements of the invention are fixed or 
hardwired or dedicated circuits that implement a specific 
function. This is different from non-hardwired BFUs of DeHon; 
DeHon is trying to have one BFU that does everything. 
Unfortunately, this approach is very wasteful of silicon real 
estate, increases power dissipation, routing overheads, 
testing circuitry and testing time and most importantly the 
overall device cost. 

DeHon describes a device architecture which is based 
around a two dimensional array of Basic Functional Units or 
BFUs (100). All the BFUs are identical (see Col. 5, line 7, 
Col. 9, line 6) and a BFU is the smallest logic unit (Col. 9, 
line 6) from which more complex processing units can be built. 
There are many disadvantages with this architecture. First, 
there is a huge area overhead. Each BFU must contain all the 
circuitry required to perform any function, on the off chance 
that it might be required. (This is also a disadvantage to 
similar array processors, such as those described in the cited
references to Mirsky and Rupp.) A BFU can be both a datapath 
unit or part of a control unit. As outlined in Figure 1 and 
Col. 5, lines 12-23 of DeHon, BFUs are programmed to implement 
specific functions. For example, a whole BFU is programmed to 
implement a Program Counter (PC) of a control unit. The logic 
and hence the silicon area overhead is therefore massive. A 
Program Counter is just a basic counter. The same is true for 
all the other programmed versions of a BFU. If it is 
programmed to be a memory unit, then the rest of BFU circuitry 
is wasted. Each BFU is trying to be a "jack" of all "logic 
trades" and is a master of none. This is a gross waste of 
silicon resources and it is difficult to justify this approach
commercially. It also seems to be a hit and miss approach, and
the logic that isn't being used directly will probably be
dissipating power. This architecture would not be good for
low power portable applications. Moreover, a BFU is a general 
purpose unit and not optimized for a particular application. 
Consequently, much of the logic is used to input and output 
signals, and hence adds to the number of levels of logic a 
signal must pass to get between BFUs. This therefore degrades 
the performance of the architecture and adds to the path
delays.

BFUs are very limited. For example, they take several 
cycles to implement a multiply. This is a serious disadvantage 
as most Digital Signal Processing (DSP) algorithms rely 
heavily on multiply-accumulate operations. While single cycle 
multipliers were available at the time of DeHon, they are not 
disclosed; instead, the architecture is designed around a 
single general purpose BFU. 

The Office Action takes the position that elements F, A, 
B of DeHon form a plurality of controller elements, as is 
presently claimed. 

The term "element" is intended to apply to one of the 
fundamental or irreducible components making up the whole. 

DeHon (Col. 9, lines 6-8) states that a BFU cell 100 is
the smallest logic unit from which more complex processing
units can be built. Also, several BFUs need to be combined
(Col. 5, lines 11-23) to form a basic controller circuit. In
fact, at least two BFUs are required to form a controller unit,
as one of the BFUs needs to act as a Program Counter (PC). Col.
5, lines 22-23, specifically states that "a final BFU, PC,
operates as a program counter for the various instruction
BFUs, F, A, B." By definition, the DeHon controller 102
cannot be a controller element because it is constructed from
several basic elements or BFU units.

The Office Action also takes the position that DeHon 
discloses two different control schemas. In fact, DeHon 
discloses a fast reduction operation and distributed PLA 
control scheme (Col. 11, line 46 to Col. 12 line 67). Again, 
by definition, if the control logic is distributed over many 
BFUs, the controller that is formed is not a controller
element. It is a control unit or function.

The present claims recite "controller element" and 
"processing element" to emphasize they are single dedicated 
logic units and are not constructed from combining other 
units. This is also explained in the specification, and 
presents a clear distinction over DeHon. There are many
disadvantages to the DeHon control scheme. First, it is
limited, as DeHon itself states (Col. 12, lines 28-31; Col.
12, lines 65-67). It requires several clock cycles to implement
an instruction (Col. 12, lines 59-62). Note that the controllers
of the invention can implement an instruction in a single clock
cycle and are therefore much more efficient in terms of
throughput, with less silicon real estate, lower power
dissipation, reduced testing overheads and hence reduced device
costs.

In addition, the code required to implement any function 
in the DeHon device is loaded during initialization. This 
makes for an inefficient use of hardware and a larger device 
because all the required functions must be implemented at 
initialization time. By dynamically updating the controller 
memories with different program code according to the 
invention, the same controllers and processing elements can be 
dynamically reconfigured to implement different algorithms
and/or functions. Consequently, the claimed architecture
results in a much smaller and less expensive device.

The routing described by DeHon is hierarchical and
pipelined (Col. 7, lines 20-24, Col. 8, lines 41-51, Col. 12, 
lines 14-15, Col. 12, lines 65-67). Consequently, it can take 
several cycles for both control and data signals to propagate 
to their destination (Col. 8, lines 64-67). Again, this is a
very inefficient architecture because the greater delays due 
to pipelining mean that all signals need to be delayed by the 
same amount, otherwise data arrives out of synchronization 
with other data. This extensive pipelining requires more 
memory and registers, adding to the logic area, power 
dissipation and overall cost of the device. More importantly,
it also adds to the testing and debugging burden of the device,
as there is more logic to test. This would then require more
test logic and add to the already large overheads of the device.

Because the DeHon architecture is pipelined, BFUs can't
be cascaded to form combinatorial datapaths allowing many
instructions to be performed in a single cycle on the same
source data, as is claimed. DeHon requires many cycles to
clock data through a pipeline to obtain a result. This
requires many intermediate storage operations and hence
contributes to the power dissipation.

The control of the DeHon device is also complex, limited 
and unwieldy (Col. 12, lines 65-67), making it difficult to 
program. Any associated software tools would be overly complex 
as they would need to test all possible signal routings. 

The routing required by DeHon, for example, is huge and 
would cause many signal integrity problems and device layout 
problems. The size overhead would mean there was more area 
dedicated to routing than to actual processing. Consequently, 
the DeHon architecture would be inefficient in terms of 
silicon usage and very expensive to fabricate. For example, 
each 8-bit BFU requires 8 x (30 x 8-bit) buses or 240 x 8-bit 
buses per BFU. This is huge and very unwieldy, if not totally
impractical. The situation is similar for the other cited 
prior art. 

In the claimed reconfigurable architecture, each
processing element (assuming an 8-bit processing element)
would only require 10 x 8-bit buses, a significant reduction. 

The Office Action states that the BFUs are "being
hardwired to perform either multiply or add functions, thus
defining a rigid architecture". However, the basis for the
DeHon device to be reconfigurable is an array of programmable
BFUs. These must therefore be reconfigurable or programmable.
If they can implement several different functions then they
can't be hardwired and hence they don't form a rigid
architecture.

In DeHon, a BFU is a single general purpose unit that can 
be programmed to implement one of several functions. In 
several of the points in the Office Action, it is stated that 
"DeHon shows a plurality of functional units (elements 512,
514, and 516)". Applicant disagrees with this argument, as a
BFU is a single functional unit programmed to implement
different functions, which is completely different from a
plurality of processing elements with different architectures.
In fact, there is no description in the patent document of
the functionality of the blocks labeled 512, 514 and
516. However, they are all BFUs and are therefore, by default,
a single architecture and not a plurality of architectures, as
is presently claimed.

The Office Action also states in several places that the 
BFU or combination of BFUs are hardwired. This is not the 
case. A BFU can perform different operations and is therefore 
not fixed or a dedicated processing element. The term
"hardwired" means not changeable and is used in technical
descriptions to denote a lack of any reprogramming, switching
or selection mechanisms. It refers to electronic circuitry that
is designed to perform a specific task and, in software, to a
function or capability that is hardcoded (programmed) into a
system. Generally, "hardwired" means anything that cannot be
modified or customized.

In the present application, processing elements have been 
specifically described that are hardwired or fixed functions, 
such as multipliers, barrel shifters, etc. and are therefore 
completely different from the DeHon BFUs. One reason for this 
approach is that the invention is much more efficient and 
doesn't waste silicon resources as in the DeHon architecture. 

Also at page 5, point 14, the Office Action states that 
an ALU is dynamically configured to perform operations 
dependent on an instruction. The ALU in question isn't 
dynamically configured, it is just decoding different inputs 
(instructions). Those skilled in the art will know that this 
is not dynamically reconfigurable logic.

Withdrawal of this rejection is requested.

Claims 5-8, 18, 30-32, 39-42 and 45 have been rejected
under 35 USC 103(a) over DeHon in view of Master et al. 

Master et al has been cited for a disclosure of rigid 
architectures, not disclosed by DeHon. 

Master et al describes an integrated circuit which 
employs rigid hardware elements which can be reconfigured in 
real time. However, there are several disadvantages to this 
approach. Firstly, each computation unit comprises several 
different rigid computational elements and a single 
computational unit controller. A plurality of computation 
units is used to form a matrix, which is then replicated many 
times to form an array of matrices. This is an inefficient use 
of hardware resources as the computational unit controller 
will only be using one of the plurality of computational
elements depending on the algorithm being implemented. Therefore,
the hardware utilization can be low. Secondly, the 
computational unit controller can only access the 
computational elements in its own computation unit. There is 
no sharing of resources by different computational unit 
controllers. Again, this is inefficient. Thirdly, the same 
computational elements and matrices are repeated across the 
integrated circuit to form a large array. There is no grading 
of reconfigurable resources across the integrated circuit in
relation to the processing and resource requirements for 
different functions used to implement a system, such as input 
interfaces, output interfaces, parallel processing and 
protocol processing and data formatting. 

Master employs semi-dedicated resource blocks (Figures
5A-5E) in a half-way house scheme. These blocks consist of
hardwired individual adders, subtractors, multipliers and the
like. However, their functionality is limited as there are
dedicated hardwired connections forming fixed blocks.
Consequently, the individual components can't be concatenated
to form different functional data paths as in the claimed
architecture. Master was looking to implement a dynamically
reconfigurable device, but didn't provide a very versatile
reconfigurable device architecture as is claimed. There are
several disadvantages to the Master reconfigurable
architecture. For example, having groups of four separate
dedicated blocks is inefficient, as not all of the circuitry is
actually being utilized for any given function. The more
dedicated the blocks, the less flexible the device is in
different applications. There will come a point where it may be
best to design a dedicated device rather than a reconfigurable
device because the reconfigurable architecture is either not
flexible or there is too much routing and reconfiguration
overhead.

Thus, any disclosure of rigid architecture by Master does 
not cure the defects of DeHon, and withdrawal of this 
rejection is requested. 

Claims 10, 14-17, 19-21 and 36-38 have been rejected 
under 35 USC 103(a) over DeHon in view of Rupp, which has been 
cited for a disclosure of a plurality of controller elements 
and processing elements grouped to form a shared resource 
processing block. 

Rupp discloses (see Col. 8, line 63 to Col. 9, line 2) a 
programmable logic structure called the Adaptive Logic 
Processor (ALP). This structure is optimized for the
implementation of program specific pipeline functions, where 
the function may be changed any number of times during the 
progress of computation. 

ALP is only part of the device architecture in a general 
purpose array and is therefore not optimized for any 
application. Moreover, the ALP (see Col. 10, lines 44-57 and
Col. 14, line 50 to Col. 15, line 64) consists of three layers
which perform specific functions. The base layer for the "core 
cell 150" is a two dimensional array of core cell logic 150. 
From the description of the ALP it can be seen that the ALP is 
not optimized for any application. 

The processing elements and controller elements of the 
invention are selectable at design time from a family of 
dedicated fixed processing elements that are optimized for a 
particular group of applications, for example, audio, video, 
communications. This distinguishes the invention from the 
prior art. 

Withdrawal of this rejection is requested.

Claims 25 and 43-44 have been rejected under 35 USC
103(a) over DeHon in view of Rupp and Master. These 
references have all been discussed in detail above, and
withdrawal of this rejection is requested. 

Claims 23-24, 26-29 and 35 have been rejected under 35 
USC 103(a) over DeHon in view of Mirsky et al, cited to show 
signal routing controlled by one or a plurality of 
reconfigurable controllers or controller elements.

What Mirsky actually discloses is an 8-bit broadcast bus 
with a single source and multiple receivers (see Col. 6, line
64 to Col. 7, line 28). By definition, this cannot be a
plurality of controllers controlling the signal routing. 
Firstly, there is only one source of control and secondly it 
is a single bus in which the receivers just decode simple 
messages. This has nothing to do with dynamic reconfiguration 
to perform different processing functions as is claimed. 

It is further alleged that Mirsky et al discloses "the 
master controller being formed from one or a plurality of 
processing blocks." Again, Applicant disagrees; there is no
mention of a master controller means in the cited paragraph. 

Mirsky employs a CNS to store configuration data. In 
fact, distributed memory is used to store several 
configuration contexts. However, they are not loaded in real 
time into the controller memories to implement different 
processing functions. The disadvantage of Mirsky et al is that 
large memories are required to hold all the required 
configuration contexts. The advantage of the claimed invention
is that smaller memories are employed that hold one or a small
number of contexts, and new contexts are dynamically loaded as
they are required. This involves the controller elements
dynamically interacting with the master controller. 

This is a significant difference from the operation and 
architecture described by Mirsky et al. 

It has also been alleged that Mirsky et al discloses "the 
master controller being an embedded processor" (see Col. 4,
lines 45-47). Again, this is not the case. There is no 
mention of master controller means in the cited paragraph, 
only configurable memory blocks. 

Withdrawal of this rejection is requested. 

Claim 22 has been rejected under 35 USC 103(a) over DeHon
in view of Rupp and Mirsky et al. These references have been 
discussed in detail above, and withdrawal of this rejection is 
requested. 

Finally, Applicant notes that none of the cited prior art 
mentions the use of asynchronous logic. There is a very good 
reason for this, as those familiar with the art understand 
that asynchronous logic design is extremely complex and 
difficult to implement, especially on a VLSI scale. It is very 
different from synchronous logic design and has only recently 
been used to implement microprocessor type processing devices. 
As an asynchronous architecture is completely different from 
the architecture of the cited prior art, Applicant submits 
that the claimed asynchronous reconfigurable logic
architecture is unique. 

Again, implementing floating point logic is much more 
complex than implementing fixed point logic and is only used 
for specific applications. Employing dynamically 
reconfigurable logic to implement these functions is a great
advantage as the claimed dynamically reconfigurable device can
be configured to implement both fixed and floating point logic 
using the same processing elements. This makes the device more 
flexible and reduces the cost of implementing both types of 
logic on the same device. Thus, the claimed floating point 
architecture is novel over the cited prior art, which does not 
mention floating point.

In view of the foregoing amendments and remarks,
Applicant submits that the present application is now in
condition for allowance. An early allowance of the
application with amended claims is earnestly solicited.

Respectfully submitted,




Ira J. Schultz 
Registration No. 28666 



